Beirut Governorate
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (38 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
'The end of the world as we know it': Is the rules-based order finished?
Canadian Prime Minister Mark Carney said the quiet part out loud at the World Economic Forum: what many call the global rules-based order was either collapsing or had collapsed already.
- North America > United States (1.00)
- North America > Canada (0.71)
- North America > Central America (0.41)
- (12 more...)
- Law > International Law (0.74)
- Government > Regional Government > North America Government > United States Government (0.49)
What next for Iran's Supreme Leader?
Iran's supreme leader, Ayatollah Ali Khamenei, in his secret hideout these days, knows he is now a marked man. He will not be sitting on his veranda anytime soon. When discussing what the United States might do next to help the protesters in Iran, US President Trump has mentioned Qassem Soleimani and Abu Bakr al-Baghdadi. The former, Iran's all-important military strategist in the Middle East, was killed on 3 January 2020 in a drone strike just outside Baghdad's international airport on the president's order. The latter, who was the leader of IS, killed himself and two children by detonating a suicide vest on 27 October 2019 when US forces raided his hideout in northern Syria after the approval of the president.
- Asia > Middle East > Iran (1.00)
- Europe > Middle East (0.25)
- Africa > Middle East (0.25)
- (27 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Regional Government > Asia Government > Middle East Government > Iran Government (1.00)
- Government > Military (1.00)
All the countries Israel attacked in 2025: Animated map
Israel has attacked more countries than any other country this year. In 2025, Israel attacked at least six countries, including Palestine, Iran, Lebanon, Qatar, Syria, and Yemen. It also carried out strikes in Tunisian, Maltese and Greek territorial waters on aid flotillas heading for Gaza.
- North America > United States (1.00)
- Asia > Middle East > Israel (1.00)
- Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.44)
- (10 more...)
- Government > Military (1.00)
- Government > Regional Government > Asia Government > Middle East Government (0.71)
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models
Zbeeb, Mohammad, Hammoud, Hasan Abed Al Kader, Mukalled, Sina, Rizk, Nadine, Karnib, Fatima, Lakkis, Issam, Mohanna, Ammar, Ghanem, Bernard
The benchmark spans five core categories: grammar, morphology, spelling, reading comprehension, and syntax, through 150 expert-designed multiple-choice questions that directly assess structural language understanding. Evaluating 35 Arabic and bilingual LLMs reveals that current models demonstrate strong surface-level proficiency but struggle with deeper grammatical and syntactic reasoning. AraLingBench highlights a persistent gap between high scores on knowledge-based benchmarks and true linguistic mastery, showing that many models succeed through memorization or pattern recognition rather than authentic comprehension. By isolating and measuring fundamental linguistic skills, AraLingBench provides a diagnostic framework for developing Arabic LLMs. The full evaluation code is publicly available on GitHub.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.05)
- (10 more...)
- Research Report (0.51)
- Questionnaire & Opinion Survey (0.34)
Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL
Diab, Ali, Chehade, Adel, Ragusa, Edoardo, Gastaldo, Paolo, Zunino, Rodolfo, Baghdadi, Amer, Rizk, Mostafa
This paper proposes a hardware-aware intrusion detection system (IDS) for Internet of Things (IoT) and Industrial IoT (IIoT) networks; it targets scenarios where classification is essential for fast, privacy-preserving, and resource-efficient threat detection. The goal is to optimize both tree-based machine learning (ML) models and compact deep neural networks (DNNs) within strict edge-device constraints. This allows for a fair comparison and reveals trade-offs between model families. We apply constrained grid search for tree-based classifiers and hardware-aware neural architecture search (HW-NAS) for 1D convolutional neural networks (1D-CNNs). Evaluation on the Edge-IIoTset benchmark shows that selected models meet tight flash, RAM, and compute limits: LightGBM achieves 95.3% accuracy using 75 KB flash and 1.2 K operations, while the HW-NAS-optimized CNN reaches 97.2% with 190 KB flash and 840 K floating-point operations (FLOPs). We deploy the full pipeline on a Raspberry Pi 3 B+, confirming that tree-based models operate within 30 ms and that CNNs remain suitable when accuracy outweighs latency. The widespread deployment of Internet of Things (IoT) systems has expanded the attack surface of modern networks, which now include critical infrastructure and operational environments vulnerable to advanced cyber threats [1], [2].
- North America > United States (0.05)
- Europe > Italy (0.04)
- Europe > France > Brittany > Finistère > Brest (0.04)
- (2 more...)
Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection
Few-shot prompting has emerged as a practical alternative to fine-tuning for leveraging the capabilities of large language models (LLMs) in specialized tasks. However, its effectiveness depends heavily on the selection and quality of in-context examples, particularly in complex domains. In this work, we examine retrieval-augmented prompting as a strategy to improve few-shot performance in code vulnerability detection, where the goal is to identify one or more security-relevant weaknesses present in a given code snippet from a predefined set of vulnerability categories. We perform a systematic evaluation using the Gemini-1.5-Flash [...] Our results show that retrieval-augmented prompting consistently outperforms the other prompting strategies. At 20 shots, it achieves an F1 score of 74.05% and a partial match accuracy of 83.90%. We further compare this approach against zero-shot prompting and several fine-tuned models, including Gemini-1.5-Flash [...] Retrieval-augmented prompting outperforms both zero-shot (F1 score: 36.35%, [...] On the other hand, fine-tuning CodeBERT yields higher performance (F1 score: 91.22%, partial match accuracy: 91.30%) but requires additional training, maintenance effort, and resources.
- South America > Venezuela (0.05)
- North America > United States > West Virginia (0.05)
- North America > Canada > Quebec > Montreal (0.05)
- (5 more...)
- Leisure & Entertainment > Sports (1.00)
- Banking & Finance (1.00)
- Law (0.98)
- (5 more...)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
- Information Technology > Communications > Social Media (0.75)
Constructing Political Coordinates: Aggregating Over the Opposition for Diverse News Recommendation
Earl, Eamon, Ding, Chen, Valenzano, Richard, Paulen-Patterson, Drai
In the past two decades, open access to news and information has increased rapidly, empowering educated political growth within democratic societies. News recommender systems (NRSs) have been shown to be useful in this process, minimizing political disengagement and information overload by providing individuals with articles on topics that matter to them. Unfortunately, NRSs often conflate underlying user interest with the partisan bias of the articles in their reading history and with the most popular biases present in the coverage of their favored topics. Over extended interaction, this can result in the formation of filter bubbles and the polarization of user partisanship. In this paper, we propose a novel embedding space called Constructed Political Coordinates (CPC), which models the political partisanship of users over a given topic-space, relative to a larger sample population. We apply a simple collaborative filtering (CF) framework using CPC-based correlation to recommend articles sourced from oppositional users, who have different biases from the user in question. We compare against classical CF methods and find that CPC-based methods promote pointed bias diversity and better match the true political tolerance of users, while classical methods implicitly exploit biases to maximize interaction. Recommender system (RS) utility has two main value measurements: users seeing content that they engage positively with, and the content providers maximizing engagement with their content or platform. While the two are evidently correlated (i.e., a user who is not properly catered to will likely cease to use the platform), the latter provides motivation for recommendation algorithms to shift a user's preferences to make them easier to cater to, resulting in higher expectations of long-term engagement [1].
Previous research [2] on the relationship between recommender systems and American political typology suggests that users with more extreme political preferences exhibit higher engagement metrics with their recommended news. Additionally, it was found that their engagement can be maximized by recommending articles among which a dominant percentage express a singular partisan bias. This establishes an implicit incentive for a News Recommender System (NRS) to shift user preferences toward political extremes through selection bias, particularly in long-term value systems or those leveraging popularity [1]. This phenomenon results in the formation of filter bubbles, where users are eventually shown only perspectives in their news which comply with their preexisting opinions, and users with heterogeneous partisanship over distinct topics have their political ideology homogenized over time.
- North America > United States (1.00)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Syria (0.04)
- (3 more...)
- Research Report (0.50)
- Overview (0.34)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Therapeutic Area (0.68)
- Media (0.68)
- Government > Military (0.67)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (0.93)